Abstract

Using a maximum entropy principle to assign a statistical weight to any graph, we introduce a model of random graphs with arbitrary degree distribution in the framework of standard statistical mechanics. We compute the free energy and the distribution of connected components. We determine the size of the percolation cluster above the percolation threshold. The conditional degree distribution on the percolation cluster is also given. We briefly present the analogous discussion for oriented graphs, giving for example the percolation criterion.

1 Introduction 

The statistical properties of networks, whether biological, social or technological, have recently received a lot of attention, both experimentally and theoretically; see e.g. refs. [...]. One of the most studied features of these networks is the degree distribution, which gives the probability for a vertex to have 0, 1, 2, ... neighbors. A striking observation is that, in many examples, the degree distribution is broad, so that the probability of having n neighbors decreases slowly with n. Several models (static or evolving) predict such behavior. More generally, they contain enough tunable parameters to reproduce almost any degree distribution.

However, static models are in general not conveniently defined within the language of statistical mechanics (see ref. [...], which motivated our interest in this question). This is for instance the case for the most intuitive proposal [...]: generate half edges independently for each vertex, with the appropriate distribution, and then join the half edges at random. This makes it rather easy to generate random graphs, but it does not assign a probability to a given simple graph in a simple way: eliminating multiple edges is formally complicated. Another proposal, made in [3], has some formal technical similarity with our work but leads to a genuinely different model.

It is moreover obvious, if not always apparent in the literature, that knowledge of the degree distribution leaves many statistical properties of the graphs undetermined, even if one insists that all vertices be equivalent. This arbitrariness is a problem, because most of the time the models used to fit the behavior of, say, a communication network are just ingenious constructions: they are not derived from clear basic principles. Such principles may be out of our reach at the moment, as is a classification of all random graph models with certain a priori properties. Consequently, we propose to use maximum entropy as a criterion to build a model that introduces no a priori bias, incorporating what we know (in this case the degree distribution) but nothing else. Comparison with real networks is then a way to get evidence for other striking features that might be overlooked today.

The maximal entropy principle is applied here to deal with constraints 
on the degree distribution but it can clearly be engineered to deal with other 
constraints. 

This paper is organized as follows:

- Section 2 starts with the main definitions, continues with a quick reminder of the Molloy-Reed model [1] and then defines the maximum entropy model. We use it to reformulate the standard Erdos-Renyi random graph model. Then we derive a few general identities valid for the maximum entropy model, and study the distribution of connected components. Our model is a close cousin of the Molloy-Reed model and we make the connection precise below. Finally we discuss the possibility of numerical simulations.

- Section 3 studies the thermodynamic limit when the number N of sites is large but the number of edges scales like N, hence the name finite connectivity limit for this regime. We derive the equations that determine all physical quantities in this regime: free energy, distribution of the number of edges incident at a vertex, ... We then study the connected components, and derive the criterion for the existence of a percolation cluster and the formula for its size. Finally, we study the distribution of the number of edges incident at a vertex in the percolation cluster.

- Section 4 analyzes the generalization to oriented graphs, ending with 
the criterion for the existence of a percolation cluster and the formula for its 
size. 

2 The model 

2.1 General definitions 

In the following, we shall concentrate on labeled simple unoriented graphs, or equivalently on symmetric 0-1 matrices with vanishing main diagonal: the matrix element $G_{i,j}$ is 1 if vertices $i$ and $j$ are connected by an edge and 0 otherwise. So we use the same letter $G$ to denote the graph and its adjacency matrix, with matrix elements $G_{i,j}$. In the sequel, unless otherwise stated, the term graph refers to a labeled simple unoriented graph. The number of edges of a graph $G$ is denoted by $E(G)$ and the number of vertices by $V(G)$.

The row-sum $G_i = \sum_j G_{i,j}$ is the number of neighbors of site $i$. The degree distribution of $G$ is the sequence $G_k = \#\{i : G_i = k\}$, so that $G_0$ is the number of isolated points of $G$, $G_1$ is the number of vertices of $G$ with exactly one neighbor, and so on.

Not every integer sequence can appear as the degree distribution of a graph on $N$ vertices: $G_k = 0$ for $k \geq N$, $\sum_k G_k = N$, and $\sum_k kG_k$ is even because this number counts each edge of $G$ twice, i.e. $\sum_k kG_k = \sum_{i,j} G_{i,j} = 2E(G)$. There are other, less obvious constraints. We call the sequences that appear as the degree distribution of a graph on $N$ vertices $N$-admissible. There is a relatively simple family of inequalities that characterizes $N$-admissible sequences, but, for instance, the (asymptotic) counting of $N$-admissible sequences is still unknown.
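The text does not spell out which inequalities characterize $N$-admissible sequences. A classical family with this property is the Erdős–Gallai theorem: a non-increasing sequence $d_1 \geq \dots \geq d_N$ with even sum is the degree sequence of a simple graph if and only if $\sum_{i\leq r} d_i \leq r(r-1) + \sum_{i>r}\min(d_i, r)$ for every $r$. Assuming this is the family meant, the following Python sketch (all helper names are hypothetical) checks on small $N$ that it reproduces exactly the degree distributions obtained by enumerating every graph.

```python
import itertools
from collections import Counter


def erdos_gallai(degrees):
    """Erdos-Gallai test: is this degree sequence realized by some simple graph?"""
    d = sorted(degrees, reverse=True)
    if sum(d) % 2:
        return False
    for r in range(1, len(d) + 1):
        lhs = sum(d[:r])
        rhs = r * (r - 1) + sum(min(x, r) for x in d[r:])
        if lhs > rhs:
            return False
    return True


def distribution(degrees, n):
    """Degree distribution (G_0, ..., G_{n-1}) of a degree sequence."""
    counts = Counter(degrees)
    return tuple(counts.get(k, 0) for k in range(n))


def admissible_by_enumeration(n):
    """N-admissible distributions, found by enumerating all labeled simple graphs."""
    pairs = list(itertools.combinations(range(n), 2))
    found = set()
    for mask in range(1 << len(pairs)):
        deg = [0] * n
        for b, (i, j) in enumerate(pairs):
            if mask >> b & 1:
                deg[i] += 1
                deg[j] += 1
        found.add(distribution(deg, n))
    return found


def admissible_by_erdos_gallai(n):
    """Same set, testing every multiset of n degrees taken in {0, ..., n-1}."""
    return {distribution(degs, n)
            for degs in itertools.combinations_with_replacement(range(n), n)
            if erdos_gallai(degs)}
```

Brute-force enumeration visits $2^{N(N-1)/2}$ graphs, so it is only meant as a cross-check for $N \leq 5$ or so.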

2.2 The Molloy-Reed model 

Before we introduce our model, let us describe the method of Molloy and Reed [1], which can be interpreted as a kind of microcanonical version of our model. The idea is quite elegant: for any integer $N$, fix an $N$-admissible sequence $\{m_{N,k}\}_{k\geq 0}$ and take as probability space the set $\Omega\{m_{N,k}\}$ of graphs with degree distribution $\{m_{N,k}\}_{k\geq 0}$, endowed with the uniform (counting) probability. By construction, in $\Omega\{m_{N,k}\}$, the probability that vertex $i \in [1, N]$ has $k$ neighbors is exactly $m_{N,k}/N$.

Molloy and Reed show that if the sequence $\{m_{N,k}/N\}$ converges (uniformly) to a probability distribution $\{\pi_k\}_{k\geq 0}$ ($\sum_k \pi_k = 1$), then, under one technical assumption, the space $\Omega\{m_{N,k}\}$ converges in an appropriate sense to a random graph ensemble $\Omega\{\pi_k\}$ on which standard questions can be formulated and answered:

- the probability in $\Omega\{\pi_k\}$ that a given vertex has $k$ neighbors is, not surprisingly, $\pi_k$;

- Molloy and Reed give a criterion for the presence or absence of a giant component.

Heuristic arguments show that the 'intuitive' model (which does not in general lead to simple graphs), namely "generate half edges independently for each vertex, with distribution $\{\pi_k\}$, and then join the half edges at random", has the same thermodynamic (large $N$) properties as the Molloy-Reed model.

2.3 The maximum entropy model 

To start with, we fix an integer $N \geq 1$ and a probability distribution $\{\pi_{N,k}\}_{k\geq 0}$ ($\sum_k \pi_{N,k} = 1$). We look for a probability distribution $\{p_G\}$ on the set of graphs on $N$ vertices such that, for any vertex $i$, $\sum_{G;G_i=k} p_G = \pi_{N,k}$, where, here and below, the notation $\sum_{G;G_i=k}$ means that the sum is restricted to graphs such that $G_i$, the number of neighbors of vertex $i$ in $G$, equals $k$. In words, we look for a probability distribution $\{p_G\}$ on the set of graphs on $N$ vertices such that the probability that vertex $i$ has $k$ neighbors is $\pi_{N,k}$. As explained in the introduction, this requirement is far from fixing the probability distribution.

We also want this probability distribution to have no other bias, so we look for a distribution $\{p_G\}$ with maximal entropy¹.

1 Notice that, if no constraint is imposed, the uniform counting measure has maximal 
entropy. This measure can be described as follows: the probability of an edge between 
vertices i and j is 1/2 independently of the presence or absence of any other edge. 
Hence we want to maximize $-\sum_G p_G \log p_G$ under the constraints
$$\sum_{G;G_i=k} p_G = \pi_{N,k},$$
which we implement with Lagrange multipliers. The extremum conditions for
$$\sum_G p_G(\log p_G - 1) - \lambda\Big(\sum_G p_G - 1\Big) - \sum_{i,k}\lambda_{i,k}\Big(\sum_{G;G_i=k} p_G - \pi_{N,k}\Big)$$
are
$$p_G = e^{\lambda + \sum_i \lambda_{i,G_i}}, \qquad \sum_G p_G = 1 \qquad \text{and} \qquad \sum_{G;G_i=k} p_G = \pi_{N,k}.$$

It is not obvious to us that these equations always have a solution, and that this solution is unique and symmetric, i.e. that $\lambda_{i,k}$ does not depend on $i$². But as usual in statistical mechanics, we can reverse the logic: we start from an arbitrary sequence of positive numbers $t_k = e^{\lambda_k}$ and define $p_G$ by
$$p_G = e^{\lambda + \sum_i \lambda_{G_i}} = e^{\lambda}\prod_k t_k^{G_k}, \qquad (1)$$
with $\lambda$ suitably adjusted so as to ensure $\sum_G p_G = 1$.
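We define the weight $w_G$ of a graph as
$$w_G = \prod_k t_k^{G_k} = \prod_i t_{G_i},$$
and the partition function as the sum of weights,
$$Z_N = \sum_G w_G = \sum_G \prod_k t_k^{G_k}.$$
Hence $\lambda = -\log Z_N$ and $p_G = w_G/Z_N$.

By construction, the probability $\pi_{N,k}$ that vertex $i$ has $k$ neighbors is independent of $i$. Recall that $G_k$ is the number of vertices with $k$ neighbors in $G$, so that
$$N\pi_{N,k} = \sum_i \sum_{G;G_i=k} p_G = \sum_G G_k\, p_G = \frac{1}{Z_N}\sum_G G_k\, w_G.$$
But $G_k w_G = \partial w_G/\partial\lambda_k$, so
$$\pi_{N,k} = \frac{1}{N}\frac{\partial \log Z_N}{\partial \lambda_k} = \frac{1}{N}\, t_k\frac{\partial \log Z_N}{\partial t_k}.$$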



2 In the large N limit, some spontaneous symmetry breaking might occur. We shall not pursue this question here.
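As a sanity check of the identity $\pi_{N,k} = \frac{1}{N}t_k\,\partial\log Z_N/\partial t_k$, here is a minimal Python sketch, assuming $N$ small enough that all $2^{N(N-1)/2}$ labeled graphs can be enumerated. The function names are hypothetical, and the derivative is approximated by a central finite difference rather than computed symbolically.

```python
import itertools
import math


def degree_sequences(n):
    """Yield the degree sequence (G_1, ..., G_n) of every labeled simple graph on n vertices."""
    pairs = list(itertools.combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        deg = [0] * n
        for b, (i, j) in enumerate(pairs):
            if mask >> b & 1:
                deg[i] += 1
                deg[j] += 1
        yield deg


def weight(deg, t):
    """w_G = prod_i t_{G_i}: the weight only depends on the vertex degrees."""
    return math.prod(t[d] for d in deg)


def partition_function(n, t):
    """Z_N = sum over all graphs of w_G (exponential cost: small n only)."""
    return sum(weight(deg, t) for deg in degree_sequences(n))


def degree_law_direct(n, t):
    """pi_{N,k} = (1/N) sum_G G_k w_G / Z_N."""
    z, acc = 0.0, [0.0] * n
    for deg in degree_sequences(n):
        w = weight(deg, t)
        z += w
        for d in deg:
            acc[d] += w
    return [a / (n * z) for a in acc]


def degree_law_from_z(n, t, h=1e-6):
    """pi_{N,k} = (1/N) t_k d(log Z_N)/dt_k, by a central finite difference."""
    law = []
    for k in range(n):
        up, down = list(t), list(t)
        up[k] += h
        down[k] -= h
        dlog = (math.log(partition_function(n, up))
                - math.log(partition_function(n, down))) / (2 * h)
        law.append(t[k] * dlog / n)
    return law
```

For instance, with $N = 4$ and $t = (1, 0.7, 0.4, 0.2)$ both functions should give the same normalized distribution up to finite-difference error.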






2.4 The Erdos-Renyi model revisited 

As a first application, but also as a preparation for Section 3, let us reinterpret the standard Erdos-Renyi random graph model in our framework. Recall that in the Erdos-Renyi model the edges are described by independent binomial variables, each edge being present with probability $p$. Recall that $E(G)$ denotes the number of edges of a graph $G$. The probability of the graph $G$ is simply $p^{E(G)}(1-p)^{N(N-1)/2 - E(G)}$, which we rewrite as $(1-p)^{N(N-1)/2}\big(\frac{p}{1-p}\big)^{E(G)}$. Now $2E(G) = \sum_k kG_k$. So letting $t_k = \big(\frac{p}{1-p}\big)^{k/2}$ shows that the Erdos-Renyi model is the maximal entropy model such that the probability that a vertex has $k$ neighbors is $\binom{N-1}{k}p^k(1-p)^{N-1-k}$. The average number of neighbors is $(N-1)p$. In the large $N$ limit, an interesting regime occurs when this number is kept fixed, so that $p \simeq a/N$ and $a$ is the control parameter. The important observation is that if $p$ scales like $N^{-1}$, the parameters $t_k$ scale like $N^{-k/2}$. In Section 3, we shall see that generically these scaling relations indeed ensure that $\log Z_N$ scales like $N$, as any "good" free energy should.

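The rewriting of the Erdos-Renyi probability can be checked by brute force. The Python sketch below (helper names are hypothetical) compares $p^{E}(1-p)^{N(N-1)/2-E}$ with $(1-p)^{N(N-1)/2}\prod_i t_{G_i}$ for $t_k = (p/(1-p))^{k/2}$, and sums the latter over all graphs on $N = 5$ vertices to recover the binomial law of the degree of one vertex.

```python
import itertools
import math


def degrees(edges, n):
    """Degree of each vertex for a graph given as a list of edges (i, j)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def er_probability(edges, n, p):
    """Erdos-Renyi probability p^E (1-p)^(N(N-1)/2 - E) of one labeled graph."""
    e = len(edges)
    return p**e * (1 - p) ** (n * (n - 1) // 2 - e)


def maxent_probability(edges, n, p):
    """The same number written as (1-p)^(N(N-1)/2) prod_i t_{G_i}, t_k = (p/(1-p))^(k/2)."""
    t = lambda k: (p / (1 - p)) ** (k / 2)
    return (1 - p) ** (n * (n - 1) // 2) * math.prod(t(d) for d in degrees(edges, n))


def vertex_degree_law(n, p):
    """Probability that vertex 0 has k neighbours, summing over all graphs on n vertices."""
    pairs = list(itertools.combinations(range(n), 2))
    law = [0.0] * n
    for r in range(len(pairs) + 1):
        for edges in itertools.combinations(pairs, r):
            law[degrees(edges, n)[0]] += maxent_probability(edges, n, p)
    return law
```

`math.prod` and `math.comb` require Python 3.8 or later.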
2.5 Useful relations 

We establish a few formulas which will be central in the following discussion.

The sequence $\{Z_N\}_{N\geq 1}$ satisfies a first-order functional recursion relation that will prove useful in the subsequent analysis. We define the formal Laurent series $H(\omega, t_0, \dots, t_k, \dots) = \sum_k t_k\omega^{-k}$.

Suppose $N \geq 2$. Then $Z_N$ is the constant (i.e. degree 0) term in the $\omega$-expansion of the product
$$H(\omega, t_0, \dots, t_k, \dots)\, Z_{N-1}(t_0 + \omega t_1, \dots, t_j + \omega t_{j+1}, \dots).$$
The $\omega$-expansion of the product is well defined because both factors involve at most a finite number of terms of positive degree.

The proof of this relation goes as follows. If $G'$ is a graph on the $N-1$ vertices $1, \dots, N-1$, it can be completed to a graph $G$ on $N$ vertices in the following ways: add vertex $N$ and $k = 0, \dots, N-1$ edges emerging from $N$, and attach these edges to any $k$ distinct vertices of $G'$. There is a simple relation between the weights of $G$ and $G'$: one vertex of degree $k$ has been added (this is taken care of by the term $t_k$ in $H$), and $k$ vertices among $1, \dots, N-1$ have seen their degree increased by 1, so in $w_{G'} = \prod_{i=1}^{N-1} t_{G'_i}$, $k$ of the factors $t_j$ are replaced by $t_{j+1}$ (this is taken care of by replacing all $t_j$'s in $Z_{N-1}$ by $t_j + \omega t_{j+1}$ and expanding to order $k$ in $\omega$). Note that the relation is also true for $N = 1$ if we make the natural choice $Z_0 = 1$.

We rewrite this result as a (formal) contour integral³:
$$Z_N(t_0, \dots, t_j, \dots) = \oint\frac{d\omega}{\omega}\sum_k t_k\omega^{-k}\, Z_{N-1}(t_0 + \omega t_1, \dots, t_j + \omega t_{j+1}, \dots).$$

The same argument, based on enumerating the ways vertex $N$ can be linked to the rest of the graph under the condition $G_N = k$, shows that
$$Z_N\pi_{N,k} = t_k\oint\frac{d\omega}{\omega^{k+1}}\, Z_{N-1}(t_0 + \omega t_1, \dots, t_j + \omega t_{j+1}, \dots),$$
to be compared with the formula of Section 2.3. These two formulae for $\pi_{N,k}$ are not trivially equivalent, because they involve different rearrangements of the sum of weights.
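The recursion can be checked numerically on small graphs. In the Python sketch below (hypothetical helper names, brute-force enumeration, so only small $N$), $Z_{N-1}(t_0 + \omega t_1, \dots)$ is expanded graph by graph as a polynomial in $\omega$, each factor $t_{G'_i}$ becoming $t_{G'_i} + \omega t_{G'_i+1}$; pairing $t_k\omega^{-k}$ with the coefficient of $\omega^k$ then extracts the constant term.

```python
import itertools
import math


def degree_sequences(n):
    """Yield the degree sequence of every labeled simple graph on n vertices (n >= 0)."""
    pairs = list(itertools.combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        deg = [0] * n
        for b, (i, j) in enumerate(pairs):
            if mask >> b & 1:
                deg[i] += 1
                deg[j] += 1
        yield deg


def z_direct(n, t):
    """Z_N by brute-force enumeration of graphs on n vertices."""
    return sum(math.prod(t[d] for d in deg) for deg in degree_sequences(n))


def poly_mul(a, b):
    """Product of two polynomials in omega given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out


def z_recursive(n, t):
    """Constant term in omega of H(omega) Z_{N-1}(t_0 + omega t_1, ...),
    with H(omega) = sum_k t_k omega^(-k); Z_{N-1} is expanded graph by graph."""
    shifted = [0.0] * n  # coefficients of omega^0, ..., omega^(n-1)
    for deg in degree_sequences(n - 1):
        poly = [1.0]
        for d in deg:
            poly = poly_mul(poly, [t[d], t[d + 1]])  # t_d -> t_d + omega t_{d+1}
        for k, c in enumerate(poly):
            shifted[k] += c
    # t_k omega^(-k) times omega^k contributes to the constant term.
    return sum(t[k] * shifted[k] for k in range(n))
```

For $N = 1$ the empty graph on zero vertices plays the role of $Z_0 = 1$, so `z_recursive(1, t)` returns $t_0$.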



2.6 Component distribution 

We study the distribution of sizes of connected components. Define $W_n$ by
$$W_n = \sum^{\circ}_{G;\,V(G)=n} w_G,$$
where $\sum^{\circ}$ denotes the sum over connected graphs. Observe that if $G$ splits as a disjoint union of two subgraphs $G_1$ and $G_2$ ($G$ contains no edge joining a vertex of $G_1$ to a vertex of $G_2$), the weight of $G$ factorizes: $w_G = w_{G_1}w_{G_2}$. So the total weight of graphs $G$ of size $N$ that are the disjoint union of $k_1$ connected components of size 1 (i.e. isolated points), $k_2$ connected components of size 2, ..., $k_n$ connected components of size $n$, ... (so $\sum_{n\geq 1} nk_n = N$) is
$$\frac{N!}{\prod_n (n!)^{k_n}\,k_n!}\prod_n W_n^{k_n}.$$
The combinatorial factor just counts the number of ways to split the $N$ vertices of $G$ into packets of the right sizes. Summing over all possible $k_n$'s gives back $Z_N$:
$$Z_N = \sum_{\substack{k_n \geq 0\\ \sum_{n\geq 1} nk_n = N}} N!\prod_n \frac{W_n^{k_n}}{(n!)^{k_n}\,k_n!}.$$

3 The symbol $\oint$ denotes the contour integral $\frac{1}{2i\pi}\int$ along a small contour surrounding the origin.



This formula allows us to view $Z_N$ not as a function of the $t_k$'s but as a function of the $W_n$'s. With this interpretation, denoting by $C_m(G)$ the number of connected components of size $m$ in the graph $G$, the average number of components of size $m$ in the random graph model is
$$\langle C_m\rangle = W_m\frac{\partial \log Z_N}{\partial W_m}.$$
So $mW_m\,\partial\log Z_N/\partial W_m$ is the average number of sites belonging to components of size $m$, and summing over $m$ we should have $\sum_m mW_m\,\partial\log Z_N/\partial W_m = N$. This is simply the statement that $Z_N$ is a homogeneous function of degree $N$ in the $W_n$'s if $W_n$ is assigned degree $n$.

This can be rephrased in compact form. Introduce a (complex or formal)⁴ variable $z$ and define $\mathcal{Z} = \sum_{n\geq 0}\frac{z^n}{n!}Z_n$, the exponential generating function for the $Z_N$'s. Replacing $Z_n$ by its expression in terms of the $W_n$'s, we get the (well-known) fact that $\mathcal{Z} = e^{\mathcal{W}}$, where $\mathcal{W} = \sum_{n\geq 1}\frac{z^n}{n!}W_n$. Conversely, one retrieves $Z_N$ by
$$Z_N = N!\oint\frac{dz}{z^{N+1}}\,e^{\mathcal{W}(z)}. \qquad (2)$$
The average number of components of size $n$ in the random graph is thus
$$W_n\frac{\partial\log Z_N}{\partial W_n} = \frac{N!}{n!(N-n)!}\,W_n\frac{Z_{N-n}}{Z_N}.$$
Similarly, the average number of times a given graph $g$ of size $n < N$ appears as a connected component in the random graph $G$ of size $N$ is
$$\frac{N!}{n!(N-n)!}\,w_g\frac{Z_{N-n}}{Z_N}.$$


4 In this section, some of the computations we make require that the $t_k$'s satisfy some properties ensuring that the series we write have a nonzero radius of convergence. For instance, we could assume that only a finite (though arbitrarily large) number of $t_k$'s are nonvanishing. Alternatively, we could work with formal power series.
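Both the component formula and eq. (2) can be verified by brute force on a small graph. The Python sketch below (hypothetical helper names; union-find is our choice for finding components) computes $Z_n$ and $W_n$ for $n \leq 5$, compares the enumerated average number of components of size $m$ with $\frac{N!}{m!(N-m)!}W_m Z_{N-m}/Z_N$, and checks $Z_n = n!\,[z^n]\,e^{\mathcal{W}(z)}$ using the recursion $n e_n = \sum_{k=1}^n k a_k e_{n-k}$ for the coefficients of $\exp(\sum_k a_k z^k)$.

```python
import itertools
import math


def graphs(n):
    """Yield (degree sequence, sorted component sizes) for every labeled simple graph on n vertices."""
    pairs = list(itertools.combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        deg = [0] * n
        parent = list(range(n))  # union-find forest for the components

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a

        for b, (i, j) in enumerate(pairs):
            if mask >> b & 1:
                deg[i] += 1
                deg[j] += 1
                parent[find(i)] = find(j)
        sizes = {}
        for v in range(n):
            root = find(v)
            sizes[root] = sizes.get(root, 0) + 1
        yield deg, sorted(sizes.values())


def weight(deg, t):
    return math.prod(t[d] for d in deg)


def z_and_w(n_max, t):
    """Z_n (all graphs) and W_n (connected graphs only) for n = 0, ..., n_max."""
    Z = [1.0] + [0.0] * n_max
    W = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        for deg, sizes in graphs(n):
            w = weight(deg, t)
            Z[n] += w
            if sizes == [n]:
                W[n] += w
    return Z, W


def mean_components(N, m, t):
    """Average number of connected components of size m, by direct enumeration."""
    z = acc = 0.0
    for deg, sizes in graphs(N):
        w = weight(deg, t)
        z += w
        acc += w * sizes.count(m)
    return acc / z


def exp_series(a, n_max):
    """Taylor coefficients of exp(sum_k a_k z^k) up to z^n_max, assuming a_0 = 0."""
    e = [1.0] + [0.0] * n_max
    for n in range(1, n_max + 1):
        e[n] = sum(k * a[k] * e[n - k] for k in range(1, n + 1)) / n
    return e
```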
2.7 Discussion

A crucial observation is that the weight of a graph depends only on its degree distribution, as in the Molloy-Reed model. But whereas in the Molloy-Reed model the weight of a graph is 0 unless it has the correct degree distribution, the degree distribution fluctuates in our model. So our model is a canonical description of a random graph model with a given "number of edges distribution at a vertex", and the Molloy-Reed model is a microcanonical one.

That the two models turn out to be equivalent in some large $N$ limit is maybe not surprising. However, note that, contrary to standard statistical mechanics (where only a few quantities, for instance energy and number of particles, fluctuate in the canonical description but are fixed in the microcanonical one), the constraint hypersurface of the microcanonical model has a codimension that grows with $N$.

Finally, let us observe that the maximum entropy model is well suited for standard thermodynamic simulations, namely heat-bath or Metropolis algorithms. This is because, contrary to the 'naive' model or the Molloy-Reed model, its phase space has a simple structure: it is the set of all graphs on $N$ vertices, with weights that depend only on the vertex degrees.
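One possible single-edge Metropolis scheme is sketched below in Python, under our own choices (uniform pair proposal, no attempt at efficiency; function names are hypothetical): pick a pair $\{i, j\}$ uniformly, toggle the edge, and accept with probability $\min(1, w_{G'}/w_G)$. Because $w_G = \prod_i t_{G_i}$, the ratio only involves the degrees of $i$ and $j$: $w_{G'}/w_G = t_{G_i\pm 1}t_{G_j\pm 1}/(t_{G_i}t_{G_j})$. The estimate is compared with exact enumeration for $N = 4$.

```python
import itertools
import math
import random


def metropolis_degree_law(n, t, steps, burn_in=1000, seed=0):
    """Estimate pi_{N,k} for p_G proportional to prod_i t_{G_i} with single-edge moves.

    Proposal: choose an unordered pair {i, j} uniformly and toggle that edge.
    The proposal is symmetric, so accepting with probability min(1, w_new / w_old)
    leaves p_G = w_G / Z_N invariant.
    """
    rng = random.Random(seed)
    adj = [[False] * n for _ in range(n)]
    deg = [0] * n
    hist = [0] * n
    for step in range(burn_in + steps):
        i, j = rng.sample(range(n), 2)
        delta = -1 if adj[i][j] else 1
        # Only the degrees of i and j change, so the weight ratio is local.
        ratio = t[deg[i] + delta] * t[deg[j] + delta] / (t[deg[i]] * t[deg[j]])
        if rng.random() < ratio:
            adj[i][j] = adj[j][i] = not adj[i][j]
            deg[i] += delta
            deg[j] += delta
        if step >= burn_in:
            for d in deg:
                hist[d] += 1
    return [h / (steps * n) for h in hist]


def exact_degree_law(n, t):
    """pi_{N,k} by enumerating all labeled simple graphs (small n only)."""
    pairs = list(itertools.combinations(range(n), 2))
    z, acc = 0.0, [0.0] * n
    for mask in range(1 << len(pairs)):
        deg = [0] * n
        for b, (i, j) in enumerate(pairs):
            if mask >> b & 1:
                deg[i] += 1
                deg[j] += 1
        w = math.prod(t[d] for d in deg)
        z += w
        for d in deg:
            acc[d] += w
    return [a / (n * z) for a in acc]
```

A heat-bath variant would instead set the chosen edge to present with probability $w_{G^+}/(w_{G^+} + w_{G^-})$; both leave $p_G$ invariant.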

3 Finite connectivity limit 
3.1 General analysis 

As suggested at the end of Section 2.4 by the special case of the Erdos-Renyi model, we shall show that a thermodynamic limit occurs in the large $N$ limit if $t_k$ scales like $N^{-k/2}$. Note that in this case $w_G$, the weight of $G$, scales like $N^{-E(G)}$, where as before $E(G)$ stands for the number of edges of $G$.

The starting point of the analysis will be the functional equation established in Section 2.5:
$$Z_N(t_0, \dots, t_j, \dots) = \oint\frac{d\omega}{\omega}\sum_k t_k\omega^{-k}\, Z_{N-1}(t_0 + \omega t_1, \dots, t_j + \omega t_{j+1}, \dots).$$



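We set $t_k = \tau_kN^{-k/2}$ and define $F_N(\tau) = \frac{1}{N}\log Z_N(t)$. Substituting $\omega N^{-1/2}$ for $\omega$ leads, after a few manipulations, to
$$e^{NF_N(\tau)} = \oint\frac{d\omega}{\omega}\sum_k\tau_k\omega^{-k}\exp\Big[(N-1)\,F_{N-1}\Big(\big\{(\tau_j + \tfrac{\omega}{N}\tau_{j+1})(1 - \tfrac{1}{N})^{j/2}\big\}_{j\geq 0}\Big)\Big].$$

This equation still involves no approximation. Now we make the usual thermodynamic hypothesis, namely that $NF_N(\tau) - (N-1)F_{N-1}(\tau)$ has a limit, say $F(\tau)$, when $N \to \infty$. This implies in particular that $F_N(\tau)$ converges to $F(\tau)$. The above equation then has a large $N$ limit. To see it clearly, we rewrite it as
$$e^{NF_N(\tau) - (N-1)F_{N-1}(\tau)} = \oint\frac{d\omega}{\omega}\sum_k\tau_k\omega^{-k}\exp\Big\{(N-1)\Big[F_{N-1}\Big(\big\{(\tau_j + \tfrac{\omega}{N}\tau_{j+1})(1 - \tfrac{1}{N})^{j/2}\big\}_{j\geq 0}\Big) - F_{N-1}(\tau)\Big]\Big\}.$$
In the large $N$ limit, the exponent tends to $\sum_j\big(\omega\tau_{j+1} - \frac{j}{2}\tau_j\big)\frac{\partial F}{\partial\tau_j}$, and since $\oint\frac{d\omega}{\omega}\,\omega^{-k}e^{\omega x} = x^k/k!$, this leads to the equation
$$1 = y\sum_k\tau_k\frac{x^k}{k!}, \qquad (3)$$
where we have defined
$$y = \exp\Big(-F - \sum_k\frac{k}{2}\,\tau_k\frac{\partial F}{\partial\tau_k}\Big), \qquad x = \sum_k\tau_{k+1}\frac{\partial F}{\partial\tau_k}.$$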


One can take an analogous limit of the other relations in Section 2.5 to obtain more detailed equations for the degree distribution,
$$\pi_k = \tau_k\frac{\partial F}{\partial\tau_k} = y\,\tau_k\frac{x^k}{k!}. \qquad (4)$$
Eq. (3) ensures that this distribution is correctly normalized, $\sum_k\pi_k = 1$.

The parameter $x$ possesses a simple interpretation. We start from the relation $\frac{\partial F}{\partial\tau_k} = y\frac{x^k}{k!}$, multiply it by $x\tau_{k+1}$ and sum over $k$ to get
$$x^2 = y\sum_k\tau_{k+1}\frac{x^{k+1}}{k!} = \sum_k k\,\pi_k, \qquad (5)$$
so that $x^2$ is the first moment of the distribution $\pi_k$ of the number of edges incident at a vertex.

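We can summarize our results quite compactly as follows. Let us introduce the function $V(x) = \sum_k\tau_k\frac{x^k}{k!}$, which we call the potential for reasons which will be clear in a moment. If all our previous formulae are to make sense, this function should have a positive radius of convergence. Let us also define
$$\mathcal{F}(y, x) = -1 - \log y - \frac{x^2}{2} + yV(x). \qquad (6)$$
Then $(y, x)$ is a critical point of $\mathcal{F}$, thanks to eqs. (3)-(5), and $F$ is the corresponding critical value. Indeed, $\partial_y\mathcal{F} = 0$ is eq. (3); $\partial_x\mathcal{F} = 0$ reads $x = yV'(x)$, which is eq. (5) divided by $x$; and at the critical point $\mathcal{F} = -\log y - x^2/2$, which equals $F$ by the definition of $y$, since eqs. (4) and (5) give $\sum_k\frac{k}{2}\tau_k\frac{\partial F}{\partial\tau_k} = \frac{x^2}{2}$.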
It is not true that these equations for $(y, x)$ always have a single solution. It is not difficult to find examples with no solution at all. We can interpret this by saying that in that case there is no thermodynamic limit in our sense. More troublesome is the case when there are several solutions. The most naive requirement would be that the physical solution is the pair $(y, x)$ that leads to the absolute maximum $\mathcal{F}_{\max}$ of $\mathcal{F}$, because the factor $e^{N\mathcal{F}_{\max}}$ will be the dominant contribution to $Z_N$. We shall meet such a behavior in one of the examples of Section 3.6, and make a few comments there.

For most of the paper, we shall simply assume that if there is more than one extremum, we have picked the correct one.
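For a given (truncated) sequence $\tau_k$, the critical point of $\mathcal{F}$ can be found numerically by eliminating $y = 1/V(x)$, which leaves $x = V'(x)/V(x)$. The Python sketch below (hypothetical names; bisection is our choice of root finder) assumes the bracket contains a single root; as discussed above, when several extrema exist the choice is left to the caller. For the Erdos-Renyi case $\tau_k = a^{k/2}$ of Section 2.4 it should return $x = \sqrt{a}$, $y = e^{-a}$, a Poisson law $\pi_k$, and $F = a/2$, which matches $\log Z_N = -\frac{N(N-1)}{2}\log(1-p) \simeq Na/2$.

```python
import math


def saddle_point(tau, lo=1e-9, hi=10.0, iters=200):
    """Solve y V(x) = 1 and x = y V'(x) for V(x) = sum_k tau_k x^k / k! (tau truncated).

    Eliminating y = 1 / V(x) leaves g(x) = x - V'(x)/V(x) = 0, solved by bisection.
    Assumes g changes sign exactly once on [lo, hi]; when several extrema of
    F(y, x) exist, the bracket must be chosen around the wanted one.
    Returns x, y, the degree law pi_k = y tau_k x^k / k! and F = -log y - x^2/2.
    """
    V = lambda x: sum(c * x**k / math.factorial(k) for k, c in enumerate(tau))
    dV = lambda x: sum(c * x ** (k - 1) / math.factorial(k - 1)
                       for k, c in enumerate(tau) if k > 0)
    g = lambda x: x - dV(x) / V(x)
    if g(lo) * g(hi) > 0:
        raise ValueError("g(x) does not change sign on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    y = 1.0 / V(x)
    pi = [y * c * x**k / math.factorial(k) for k, c in enumerate(tau)]
    return x, y, pi, -math.log(y) - x * x / 2
```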



3.2 Connected components 

In Section 2.6, we gave a formula for $Z_N$ in terms of connected components. This formula also has an interesting limiting form in the thermodynamic limit, but we shall wait until the next section to derive it. For the time being, recall that $W_n$ is the sum of the weights of connected graphs on $n$ vertices. We have shown that the average number of components of size $n$ in the random graph is $\frac{N!}{n!(N-n)!}W_n\frac{Z_{N-n}}{Z_N}$.

Now we split $W_n = \sum_{l\geq 0}W_{n,l}$ as a sum of contributions corresponding to connected graphs with $l = 0, 1, \dots$ (independent) loops⁵. If $G$ is a connected graph with $L$ (independent) loops, $E$ edges and $V$ vertices, an old theorem of Euler says that $L = E - V + 1$ (in particular trees, i.e. connected graphs without loops, have $E = V - 1$). Since $\sum_i G_i = 2E$, $W_{n,l}$ is simply the homogeneous component of degree $2(n + l - 1)$ in the $t_k$'s, if $t_k$ is assigned degree $k$. If we set $t_k = \tau_kN^{-k/2}$, we see that $W_{n,l}(t) = N^{1-n-l}W_{n,l}(\tau)$. We define $T_n = W_{n,0}(\tau)$.

When $N \to \infty$ for fixed $n$ and fixed $\tau_k$'s, we find that
$$\frac{N!}{n!(N-n)!}\,W_n \simeq N\frac{T_n}{n!},$$
meaning that trees dominate. In the thermodynamic limit, we find as before that
$$\frac{Z_{N-n}(t)}{Z_N(t)} \simeq e^{-n\left(F + \sum_k\frac{k}{2}\tau_k\frac{\partial F}{\partial\tau_k}\right)} = y^n.$$

So in the thermodynamic limit, the average number of components of size $n$ in the random graph is $N\frac{T_n}{n!}y^n$. The number of points in components of size $n$ in the random graph is
$$C_n = Nn\frac{T_n}{n!}y^n,$$
and the total fraction of sites occupied by finite components is
$$\sum_{n\geq 1} n\frac{T_n}{n!}y^n.$$

If this number is 1, we can consistently interpret the random graph model as a random forest model in the thermodynamic limit. However, if this number is smaller than 1, a finite fraction of points is not in finite components, and there is a percolation cluster in the system.

5 Or closed circuits in the mathematical literature.



3.3 Tree distribution 

We would like to find a closed formula for the generating function
$$T(y) = \sum_{n\geq 1}\frac{T_n}{n!}\,y^n.$$

The first observation comes from an analogy with a baby quantum field theory. The asymptotic expansion of the integral
$$I = \frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{+\infty}dx\;e^{(-x^2/2 + yV(x))/\hbar}$$
in powers of the $\tau_k$'s has a useful reinterpretation. Namely
$$I = \sum_G\frac{\hbar^{L(G)-C(G)}}{A(G)}\prod_k (y\tau_k)^{G_k},$$
where the sum is over Feynman graphs⁶ with an arbitrary number of vertices, $G_k$ is the number of vertices of degree $k$ in $G$, $L(G)$ is the number of independent loops of $G$, $A(G)$ is the order of the automorphism group of $G$ (for a precise definition see e.g. [...]) and $C(G)$ is the number of connected components of $G$. Again, by a factorization argument for the weights, the connected contributions exponentiate, and
$$\log I = \sum_{G\ \text{connected}}\frac{\hbar^{L(G)-1}}{A(G)}\prod_k (y\tau_k)^{G_k}.$$

6 Warning: Feynman graphs are essentially general graphs, i.e. not necessarily simple!

In the classical (small $\hbar$) limit, on the one hand, graphs with $L(G) = 0$ dominate. Though Feynman graphs are not necessarily simple, loopless Feynman graphs are just ordinary trees. On the other hand, $I$ can be calculated in the limit of small $\hbar$ by the saddle point approximation, leading to the identity between formal power series
$$T(y) = \sum_{\substack{G\ \text{connected}\\ \text{loopless}}}\frac{1}{A(G)}\prod_k (y\tau_k)^{G_k} = S(x), \qquad (7)$$
with $S(x) = -x^2/2 + yV(x) = -x^2/2 + y\sum_k\tau_kx^k/k!$, where $x$ is the formal power series in $y$ and the $\tau_k$'s at which $S$ is extremal: $x = y\sum_k\tau_{k+1}x^k/k! = yV'(x)$.

Hence, the expansion of $-x^2/2 + yV(x)$ in a formal power series of $y$, with $x = yV'(x)$, yields $\sum_n y^nT_n/n!$. From $T(y) = -x^2/2 + yV(x)$ and the stationarity condition, we also infer that $T'(y) = V(x)$.

The expansion of $T(y)$ is convergent for small $y$ if $V(x)$ has a nonvanishing radius of convergence. Note that if $\tau_1 = 0$, the solution $x = 0$ has to be chosen, because it leads to the correct $T(y) = \tau_0y$ (trees on two or more vertices have leaves, so they do not contribute when $\tau_1 = 0$).

We now make the general assumption that $\tau_1 \neq 0$ and that $V(x)$ has a nonvanishing radius of convergence. Let us study the inversion of the relation $y = x/V'(x)$, which can be obtained via the Lagrange formula. By Cauchy's residue formula,
$$x(y) = \oint d\xi\;\xi\,\frac{\frac{d}{d\xi}\big(\xi/V'(\xi)\big)}{\xi/V'(\xi) - x/V'(x)},$$
where the $\xi$ contour has index 1 with respect to $x$. Replacing $x/V'(x)$ by $y$, the $y$-expansion yields
$$x(y) = \sum_{n\geq 0}y^n\oint d\xi\;\xi\,\frac{\frac{d}{d\xi}\big(\xi/V'(\xi)\big)}{\big(\xi/V'(\xi)\big)^{n+1}}.$$
One can use integration by parts to get
$$x(y) = \sum_{n\geq 1}\frac{y^n}{n!}\Big[\Big(\frac{d}{dx}\Big)^{n-1}V'(x)^n\Big]_{x=0}.$$

This is clearly a series with nonnegative coefficients (for nonnegative $\tau_k$'s). As a consequence, its radius of convergence is given by the first singularity on the positive real axis, at the point $y_m = x_m/V'(x_m)$ corresponding to the unique maximum of the function $x/V'(x)$ for $x > 0$; the maximum is unique because $V'(x) - xV''(x) = \tau_1 - \sum_{k\geq 2}(k-1)\tau_{k+1}x^k/k!$ is decreasing for $x > 0$. This maximum might have two origins: either $x_m$ is a singular point of $V$, or the derivative of $x/V'(x)$, which is positive for small $x$, vanishes at $x_m$. The latter is equivalent to $V'(x_m) - x_mV''(x_m) = 0$.

The explicit form of $T_n$ can similarly be obtained via the Lagrange formula:
$$T(y) = \tau_0\,y + \sum_{n\geq 2}\frac{y^n}{n!}\Big[\Big(\frac{d}{dx}\Big)^{n-2}V'(x)^n\Big]_{x=0},$$
a classical formula which can also be proved by purely combinatorial arguments, giving an independent argument for the fact that the classical limit of quantum field theory is described by trees.
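The Lagrange formula for $T_n$ can be compared with a direct sum over labeled trees. The Python sketch below (hypothetical names) uses the Prüfer bijection, which we bring in as a standard combinatorial tool: labeled trees on $n \geq 2$ vertices correspond to sequences in $\{0, \dots, n-1\}^{n-2}$, and vertex $i$ has degree $1 +$ (number of occurrences of $i$). Exact rational arithmetic avoids rounding issues.

```python
import math
from fractions import Fraction
from itertools import product


def tree_sum_prufer(n, tau):
    """T_n: sum over labeled trees on n vertices of prod_i tau_{deg(i)}, via Pruefer sequences."""
    if n == 1:
        return Fraction(tau[0])
    total = Fraction(0)
    for seq in product(range(n), repeat=n - 2):
        term = Fraction(1)
        for i in range(n):
            term *= tau[1 + seq.count(i)]
        total += term
    return total


def tree_sum_lagrange(n, tau):
    """T_n = [(d/dx)^(n-2) V'(x)^n]_{x=0} = (n-2)! [x^(n-2)] V'(x)^n for n >= 2."""
    if n == 1:
        return Fraction(tau[0])
    m = n - 2
    # Taylor coefficients of V'(x) = sum_k tau_{k+1} x^k / k!, truncated at x^m.
    dv = [Fraction(tau[k + 1], math.factorial(k)) for k in range(m + 1)]
    power = [Fraction(1)] + [Fraction(0)] * m  # series of V'(x)^0
    for _ in range(n):
        power = [sum(power[i] * dv[d - i] for i in range(d + 1)) for d in range(m + 1)]
    return math.factorial(m) * power[m]
```

With all $\tau_k = 1$ the formula reduces to Cayley's count $n^{n-2}$ of labeled trees, a convenient spot check.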




3.4 Percolation 

The analysis of the previous section shows that the series
$$\sum_n n\Big(\frac{x}{V'(x)}\Big)^{n-1}\frac{T_n}{n!}$$
converges for any positive $x$ in the domain of convergence of $V$, and that, in this domain, its sum equals $V(x)$ for $x \leq x_m$.

For $x > x_m$ the series is still convergent. However, there is a (unique) number $x^* < x_m$ such that $x^*/V'(x^*) = x/V'(x)$ and, since the series only involves the ratio $x/V'(x)$, we have
$$\sum_n n\Big(\frac{x}{V'(x)}\Big)^{n-1}\frac{T_n}{n!} = V(x^*) < V(x), \qquad \text{for } x > x_m.$$

The percolation question can be rephrased as follows: is the relevant solution $x$ of the system $yV'(x) = x$, $yV(x) = 1$ such that $x < x_m$ or not? Indeed, we know that the fraction of points in finite clusters is $\sum_n nT_ny^n/n!$. Substituting $y = x/V'(x)$, this series sums to $yV(x) = 1$ if $x < x_m$, but to $yV(x^*(x)) < 1$ if $x > x_m$. Since the derivative of $x/V'(x)$ has the sign of $V'(x) - xV''(x)$, the condition $x > x_m$ is equivalent to $V'(x) - xV''(x) < 0$. This can be transcribed in terms of the probability distribution $\{\pi_k\}$: $yxV'(x) = \sum_k k\pi_k = \langle k\rangle$ and $yx^2V''(x) = \sum_k k(k-1)\pi_k = \langle k(k-1)\rangle$ (we use brackets to denote averages over the distribution $\{\pi_k\}$).

Thus, the percolation criterion is that there is a percolation cluster in the system if and only if $\langle 2k - k^2\rangle < 0$. This is precisely the criterion given in ref. [1]. The relative size of the giant component, $Q_\infty = 1 - \sum_n nT_ny^n/n!$, is then
$$Q_\infty = 1 - yV(x^*) = 1 - \sum_k\pi_k\,(x^*/x)^k, \qquad (8)$$
where $x^*$ is the smallest solution $x$ of $y = x/V'(x)$. This is again in agreement with the result of ref. [...]. Close to the percolation threshold, and for a generic potential $V$, the size of the giant component increases linearly with $\langle k^2 - 2k\rangle$:
$$Q_\infty \simeq \frac{2\langle k\rangle\,\langle k^2 - 2k\rangle}{\langle k(k-1)(k-2)\rangle}.$$

This formula is not valid when the probability distribution $\pi_k$ has no third moment. The growth of the giant component close to the transition can then exhibit a different critical behavior. We shall give an example of this situation in the examples of Section 3.6.
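A numerical sketch of the criterion and of eq. (8), under our own conventions (hypothetical names, degree law given as a finite list, bisection as root finder). Writing $s = x^*/x$ and using $y\tau_k = \pi_kk!/x^k$, the condition $x^*/V'(x^*) = x/V'(x)$ becomes $\sum_k k\pi_ks^{k-1} = s\langle k\rangle$, and eq. (8) reads $Q_\infty = 1 - \sum_k\pi_ks^k$. For a Poisson law, $Q_\infty$ should satisfy the familiar relation $Q_\infty = 1 - e^{-aQ_\infty}$, and just above threshold it should be close to the linear approximation.

```python
import math


def poisson(a, kmax=60):
    """Truncated Poisson degree law pi_k = e^{-a} a^k / k!, used as an example."""
    return [math.exp(-a) * a**k / math.factorial(k) for k in range(kmax + 1)]


def moments(pi):
    """<k>, <k^2> and <k(k-1)(k-2)> of a degree law given as a list."""
    m1 = sum(k * p for k, p in enumerate(pi))
    m2 = sum(k * k * p for k, p in enumerate(pi))
    m3 = sum(k * (k - 1) * (k - 2) * p for k, p in enumerate(pi))
    return m1, m2, m3


def percolates(pi):
    """Percolation criterion <k^2 - 2k> > 0."""
    m1, m2, _ = moments(pi)
    return m2 - 2 * m1 > 0


def giant_component(pi, iters=200):
    """Q_inf = 1 - sum_k pi_k s^k (eq. (8)), where s = x*/x is the smallest root
    in [0, 1) of h(s) = sum_k k pi_k s^(k-1) - s <k>."""
    m1 = moments(pi)[0]
    h = lambda s: sum(k * p * s ** (k - 1) for k, p in enumerate(pi) if k > 0) - s * m1
    # h is convex with h(1) = 0; look for a point just below 1 where h < 0.
    hi = next((1 - 2.0**-j for j in range(1, 30) if h(1 - 2.0**-j) < 0), None)
    if hi is None:
        return 0.0  # only the trivial root s = 1 (up to the scan resolution)
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    s = 0.5 * (lo + hi)
    return 1 - sum(p * s**k for k, p in enumerate(pi))


def near_threshold(pi):
    """Linear approximation 2<k><k^2 - 2k> / <k(k-1)(k-2)>."""
    m1, m2, m3 = moments(pi)
    return 2 * m1 * (m2 - 2 * m1) / m3
```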

Let us analyse in more detail what happens if $x = x_m$. We know that there is no percolation cluster. Now, if the radius of convergence of $V$ is strictly larger than $x_m$, then $T'(y)$ has a square-root branch point at $y_m = x_m/V'(x_m)$. This implies that the number of points in components of size $n$ decreases algebraically, $C_n \sim Nn^{-3/2}$ for large $n$. In the physics language, this is interpreted as a critical point, and $3/2$ as a critical exponent. Note that even in this case, the distribution $\{\pi_k\}$ still decreases at least exponentially at large $k$.

To observe other critical points, with different critical exponents, the radius of convergence of $V$ has to be exactly $x_m$, which requires some fine tuning. In that case, both $C_n$ and $\pi_k$ decrease algebraically. Assume that $V$ has a leading singularity at $x = x_m$, locally of the form $(x_m - x)^\gamma$, with $\gamma > 2$ to ensure the existence of $\langle k\rangle$ and $\langle k^2\rangle$. Generically, $y - y_m$ is linear in $x - x_m$, so that $T'(y) = V(x)$ has a leading singularity of the form $(y_m - y)^\gamma$, and both $\pi_k$ and $C_k/N = kT_ky^k/k!$ decrease as $k^{-\gamma-1}$. We shall give an example below.

If there is no percolation cluster, we can treat the large $N$ limit from another point of view. We start from eq. (2) and, in the contour integral giving $Z_N$, we change variables $z \to Nz$, leading to
$$Z_N = \frac{N!}{N^N}\oint\frac{dz}{z^{N+1}}\exp\Big(N\sum_{n\geq 1}\sum_{l\geq 0}N^{-l}\frac{z^n}{n!}W_{n,l}(\tau)\Big).$$
For fixed $n$, the connected graphs with loops ($l \geq 1$) are suppressed by inverse powers of $N$. However, in the sum over the sizes of connected components, terms up to $n = N$ contribute to the contour integral, and it might happen that, for large $n$ and $N$ related by some condition, connected components of size $n$ with loops make a finite contribution to $\frac{1}{N}\log Z_N$. However, if there is no percolation cluster, we may safely neglect $l \geq 1$ and get an accurate approximation to the leading exponential behavior of $Z_N$ in the large $N$ limit.

Under appropriate conditions, the contour integral for $Z_N$ can be deformed to pass through a dominant saddle point, and the free energy is then given by the saddle point approximation. We find $Z_N \sim e^{NF(\tau)}$ with $F(\tau) = -1 - \log z + T(z)$, $z$ being the saddle point maximizing $-1 - \log z + T(z)$. This is what one gets from eq. (6) when $y = z$ and $x$ is seen as a function of $y = x/V'(x)$. This gives yet another proof of the dominance of trees and of the Lagrange inversion formula.



3.5 Conditional degree distributions 

We now present formulas for the degree distributions, denoted $\pi_k^{(n)}$, of vertices within clusters of size $n$. We are particularly interested in the degree distribution in the giant component when it exists.

From the last formula of Section 2.6, the average number of vertices of degree $k$ belonging to a component of size $n$ is
$$C_n(k) = \frac{N!}{n!(N-n)!}\Big[t_k\frac{\partial W_n}{\partial t_k}\Big]\frac{Z_{N-n}}{Z_N}.$$
In the thermodynamic limit, $t_k = N^{-k/2}\tau_k$, $N \to \infty$, this becomes
$$\frac{C_n(k)}{N} = \frac{y^n}{n!}\,\tau_k\frac{\partial T_n}{\partial\tau_k}.$$
By definition, the degree distribution within components of size $n$ is $C_n(k)$ divided by the average number of points in components of size $n$, so that $\pi_k^{(n)} = C_n(k)/C_n$, with $C_n/N = nT_ny^n/n!$ in the thermodynamic limit. Hence
$$\pi_k^{(n)} = \frac{1}{n}\,\tau_k\frac{\partial\log T_n}{\partial\tau_k}.$$
Notice that these distributions are normalized, $\sum_k\pi_k^{(n)} = 1$, since the $T_n$'s are homogeneous polynomials of degree $n$ in the $\tau_k$'s if each $\tau_k$ is assigned degree one.

Assume now that the percolation criterion is satisfied, so that a giant component exists. The number of vertices of degree $k$ in the giant component is $C_\infty(k) = N\pi_k - \sum_n C_n(k)$. In the thermodynamic limit,
$$\frac{C_\infty(k)}{N} = \pi_k - \sum_n\frac{y^n}{n!}\,\tau_k\partial_{\tau_k}T_n = \pi_k - \tau_k(\partial_{\tau_k}T)(y).$$
But $T(y) = -x^2/2 + yV(x)$ with the extremum condition $x = yV'(x)$, so that $(\partial_{\tau_k}T)(y) = y\,x^k/k!$, where $x$ is the branch picked by the tree series, i.e. $x^*$ above the percolation threshold. Using $\pi_k = y\tau_kx^k/k!$, we get, with $x^*$ defined as in Section 3.4,
$$\frac{C_\infty(k)}{N} = \pi_k\big(1 - (x^*/x)^k\big),$$
or equivalently,
$$\pi_k^{(\infty)} = \frac{\pi_k\big(1 - (x^*/x)^k\big)}{Q_\infty}, \qquad (9)$$
since $\pi_k^{(\infty)}$ is the ratio between $C_\infty(k)$ and the number of points in the giant cluster, which is $NQ_\infty$. As it should, $\pi_k^{(\infty)}$ is correctly normalized, $\sum_k\pi_k^{(\infty)} = 1$, and vanishes at $k = 0$ (there is no isolated vertex in the giant component). There is a crossover value $k_c = [\log(x/x^*)]^{-1}$ above which $\pi_k^{(\infty)}$ is exponentially close to $\pi_k/Q_\infty$. Close to the transition, the ratio $\pi_k^{(\infty)}/\pi_k$ goes to $k/\langle k\rangle$.

The formula for $\pi_k^{(\infty)}$ has a simple probabilistic interpretation. Since it is the conditional probability that a vertex has $k$ neighbors given that it is in the giant component, it can be written as the quotient of $\pi_{k,\infty}$, the probability to have $k$ neighbors and to be in the percolation cluster, by $Q_\infty$. We read from eq. (9) that $\pi_{k,\infty} = \pi_k - \pi_k(x^*/x)^k$. Hence $\pi_k(x^*/x)^k$ is the probability for a vertex to have $k$ neighbors and to be in a finite component. This suggests that when a new point is added to the graph, the probability that it connects to $k$ other vertices, none of them in the giant component, is $\pi_k(x^*/x)^k$: for each new edge, the penalty for avoiding the giant component is $x^*/x$.
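Eq. (9) is easy to evaluate once $s = x^*/x$ is known. The Python sketch below (hypothetical names) restricts itself to a Poisson law, for which the condition $x^*/V'(x^*) = x/V'(x)$ reduces to $s = e^{a(s-1)}$; iterating this map from $s = 0$ increases monotonically to the smallest root. It checks that $\pi_k^{(\infty)}$ is normalized, vanishes at $k = 0$, and that just above threshold $\pi_k^{(\infty)}/\pi_k$ is close to $k/\langle k\rangle$.

```python
import math


def poisson(a, kmax=80):
    """Truncated Poisson degree law pi_k = e^{-a} a^k / k!."""
    return [math.exp(-a) * a**k / math.factorial(k) for k in range(kmax + 1)]


def smallest_root_poisson(a, iters=20000):
    """Smallest root s* = x*/x of s = exp(a (s - 1)), by monotone iteration from s = 0."""
    s = 0.0
    for _ in range(iters):
        s = math.exp(a * (s - 1))
    return s


def giant_degree_law(pi, s):
    """Eq. (9): pi_k^(inf) = pi_k (1 - s^k) / Q_inf, with Q_inf = 1 - sum_k pi_k s^k."""
    q_inf = 1 - sum(p * s**k for k, p in enumerate(pi))
    return [p * (1 - s**k) / q_inf for k, p in enumerate(pi)], q_inf
```

For the Poisson law, $Q_\infty = 1 - H(s^*) = 1 - s^*$, which gives one more consistency check.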

3.6 Reconstruction, with examples 

The maximal entropy graph distribution can be reconstructed from the data of the degree distribution $\pi_k$, $\sum_k\pi_k = 1$. We set $H(s) = \sum_k\pi_ks^k$.

Given $\pi_k$, $x$ is defined as the positive square root of $\langle k\rangle = \sum_k k\pi_k$, and $y\tau_k$ as $\pi_k\,k!/x^k$. This yields $yV(\xi) = \sum_k\pi_k(\xi/x)^k = H(\xi/x)$. The coefficient $y$ then appears as a normalization factor which may be chosen at will, e.g. we could set $y = 1$.




$$\bar y\,\tau_k = \frac{k!\,\pi_k}{\bar x^{k}},$$

or equivalently,

$$\bar y\,V(\bar x s) = H(s). \qquad (9)$$
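The reconstruction step can be checked numerically. The following Python sketch (the helper names and the truncation at `kmax` terms are assumptions made for illustration) builds $\bar y\tau_k = \pi_k k!/\bar x^k$ from a truncated degree distribution and verifies the two extremum conditions $\bar yV(\bar x) = 1$ and $\bar yV'(\bar x) = \bar x$.

```python
import math

def reconstruct(pi):
    # xbar = sqrt(<k>), ytau[k] = ybar * tau_k = pi_k k! / xbar^k
    mean_k = sum(k * p for k, p in enumerate(pi))
    xbar = math.sqrt(mean_k)
    ytau = [p * math.factorial(k) / xbar**k for k, p in enumerate(pi)]
    return xbar, ytau

def yV(ytau, x):
    # ybar V(x) = sum_k ybar tau_k x^k / k!
    return sum(c * x**k / math.factorial(k) for k, c in enumerate(ytau))

def yV_prime(ytau, x):
    # ybar V'(x) = sum_{k>=1} ybar tau_k x^(k-1) / (k-1)!
    return sum(c * x**(k - 1) / math.factorial(k - 1)
               for k, c in enumerate(ytau) if k >= 1)

# Truncated geometric distribution pi_k = (1 - p) p^k, renormalized.
# kmax is kept well below 170 so that k! still fits in a float.
p, kmax = 0.4, 100
pi = [(1 - p) * p**k for k in range(kmax + 1)]
total = sum(pi)
pi = [w / total for w in pi]

xbar, ytau = reconstruct(pi)
print(yV(ytau, xbar), yV_prime(ytau, xbar) - xbar)
```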
The tree distribution $T(y) = \sum_n T_n y^n/n!$ is then reconstructed, as a formal series, from $T'(y) = V(x)$ with $x = yV'(x)$. It is clear that $T_n\bar y^{\,n}$ is independent of the chosen normalization for $\bar y$, since $T_n$ is homogeneous of degree $n$ in the $\tau_k$'s and the products $\bar y\tau_k$ are fixed.
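The formal series $T'(y)$ can be expanded by Lagrange inversion: from $x = yV'(x)$ one gets, for $n \ge 1$, $[y^n]\,T'(y) = \frac1n[x^{n-1}]\,V'(x)^{n+1}$, and $[y^0]\,T'(y) = V(0)$. The Python sketch below implements this with truncated power series, using the normalization $\bar y = 1$ (so the input is $\bar y\tau_k$). It assumes, consistently with $Q = \bar yT'(\bar y)$, that the term of order $\bar y^{\,n}$ in $\bar yT'(\bar y)$ is the fraction of vertices lying in components of size $n$; the function names are hypothetical. For the Poissonian case it reproduces the classical Erdős-Rényi result $n^{n-1}a^{n-1}e^{-an}/n!$.

```python
import math

def mul(a, b, deg):
    # Product of two power series (coefficient lists), truncated to degree deg.
    out = [0.0] * (deg + 1)
    for i, ai in enumerate(a[:deg + 1]):
        for j, bj in enumerate(b[:deg + 1 - i]):
            out[i + j] += ai * bj
    return out

def power(a, m, deg):
    # a(x)^m truncated to degree deg, by repeated multiplication.
    res = [1.0] + [0.0] * deg
    for _ in range(m):
        res = mul(res, a, deg)
    return res

def component_fractions(ytau, nmax):
    # ytau[k] = ybar * tau_k, i.e. the normalization ybar = 1.
    # Taylor coefficients of V'(x) = sum_j tau_{j+1} x^j / j!
    dV = [ytau[j + 1] / math.factorial(j) for j in range(len(ytau) - 1)]
    frac = {1: ytau[0]}  # [y^0] T'(y) = V(0): isolated vertices
    for n in range(1, nmax):
        # Lagrange inversion: [y^n] T'(y) = (1/n) [x^(n-1)] V'(x)^(n+1)
        frac[n + 1] = power(dV, n + 1, n - 1)[n - 1] / n
    return frac

a, kmax = 0.5, 40
xbar = math.sqrt(a)
pi = [math.exp(-a) * a**k / math.factorial(k) for k in range(kmax + 1)]
ytau = [p * math.factorial(k) / xbar**k for k, p in enumerate(pi)]
frac = component_fractions(ytau, 12)
for n in (1, 2, 3):
    print(n, frac[n], n**(n - 1) * a**(n - 1) * math.exp(-a * n) / math.factorial(n))
```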

The fraction of sites occupied by the finite size components is $Q = \bar yT'(\bar y)$. By construction, $\bar x$ is a solution of $x/V'(x) = \bar y$. The giant component exists when there are two solutions to this equation in the interval $[0, \bar x]$. We denote by $x^*$ the smaller of them. The fraction of sites occupied by the giant component is $Q_\infty = 1 - \bar yV(x^*)$. Equivalently (setting $s = x/\bar x$), one can look for a solution $0 < s^* < 1$ of the equation $H'(s) = sH'(1)$; if there is one, $Q_\infty = 1 - H(s^*)$.
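In practice $s^*$ can be found without computing $V$ at all. A minimal sketch, assuming a truncated degree distribution given as a Python list: iterating $s \mapsto H'(s)/H'(1)$ from $s = 0$ produces an increasing sequence (all $\pi_k \ge 0$) that converges to the smallest root of $H'(s) = sH'(1)$ in $[0,1]$, and then $Q_\infty = 1 - H(s^*)$. The function name and tolerances are illustrative choices.

```python
import math

def giant_fraction(pi, tol=1e-15, max_iter=100000):
    def H(s):
        # H(s) = sum_k pi_k s^k
        return sum(p * s**k for k, p in enumerate(pi))

    def dH(s):
        # H'(s) = sum_{k>=1} k pi_k s^(k-1)
        return sum(k * p * s**(k - 1) for k, p in enumerate(pi) if k >= 1)

    mean_k = dH(1.0)  # H'(1) = <k>
    s = 0.0
    for _ in range(max_iter):
        s_next = dH(s) / mean_k
        done = abs(s_next - s) < tol
        s = s_next
        if done:
            break
    return 1.0 - H(s), s  # (Q_inf, s*)

a = 2.0
pi = [math.exp(-a) * a**k / math.factorial(k) for k in range(80)]
q, s_star = giant_fraction(pi)
print(q, 1.0 - math.exp(-a * q))
```

For the Poissonian distribution $H'(s)/H'(1) = H(s)$, so $s^* = 1 - Q_\infty$ and the printed pair should agree with the familiar self-consistency relation $Q_\infty = 1 - e^{-aQ_\infty}$.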

Let us illustrate this reconstruction on a few simple examples. 

1. Poissonian degree distribution: $\pi_k = e^{-a}a^k/k!$. This is the Erdős-Rényi model. We have $\bar x = a^{1/2}$ and, choosing $\bar y = e^{-a}$, $V(x) = \exp(\bar x x)$. The tree distribution is $T'(y) = \exp(\bar x x)$ with $\bar x x\, e^{-\bar x x} = ay$. The giant component exists for $a > 1$, when the equation $u e^{-u} = a e^{-a}$ (with $u = \bar x x$) admits two solutions $a^*$ and $a$ with $a^* < 1 < a$. Its relative size is

$$Q_\infty = 1 - e^{-a}T'(e^{-a}) = 1 - e^{a^* - a} = 1 - a^*/a.$$

2. Geometric degree distribution: $\pi_k = (1-p)p^k$. Then $\bar x^2 = p/(1-p)$ and $\bar yV(x) = 1/(1 + \bar x^2 - \bar x x)$. The extremum relation $x = yV'(x)$ is a cubic equation: $\bar y\,x\,(1 + \bar x^2 - \bar x x)^2 = y\,\bar x$. The percolation transition is at $\bar x^2 = 1/2$ ($p = 1/3$), and the relative size of the giant component is $Q_\infty = 1 - (x^*/\bar x)^{1/2}$ with $\bar x x^* = \frac12\left(\bar x^2 + 2 - \bar x\sqrt{4 + \bar x^2}\right)$.

This example confronts us with the ambiguity problem alluded to earlier.

Changing $p$ into $1-p$ amounts to replacing $\bar x$ by $1/\bar x$. This maps $\bar yV(x) = 1/(1 + \bar x^2 - \bar x x)$ into itself, up to an irrelevant multiplicative factor. To state things in a slightly different way, the extremum conditions $\bar yV'(\bar x) = \bar x$ and $\bar yV(\bar x) = 1$ have two solutions: one leads to the geometric distribution with parameter $p$, and the other to the geometric distribution with parameter $1-p$. Of course, the actual computation will make a definite choice. The criterion of the maximum for $F$ leads one to choose $\min(p, 1-p)$, and this is also consistent with continuity starting from $p = 0$, a random graph made of isolated points.

However, the formulas obtained before for, say, the size of the giant component coincide with those of ref. [1] even for $p > 1/2$. This situation requires clarification. Maybe this is the point where the canonical and the microcanonical approaches finally diverge and stop being equivalent.

3. An example of a scale-free distribution: $H(s) = \sum_k \pi_k s^k = \tau_0 + \tau_1 s + \tau_2 s^2/2 + \tau\left(1 - \beta s + \beta(\beta-1)s^2/2 - (1-s)^\beta\right)$, where $2 < \beta < 3$ and $\tau_0$, $\tau_1$, $\tau_2$ and $\tau$ are nonnegative parameters subject to the condition $\tau_0 + \tau_1 + \tau_2/2 + \tau(\beta-1)(\beta-2)/2 = 1$, which ensures that the $\pi_k$'s are correctly normalized. At large $k$ the $\pi_k$'s decrease like $\pi_k \sim \tau\,k^{-\beta-1}/|\Gamma(-\beta)|$.

Then $\bar x^2 = H'(1) = \tau_1 + \tau_2 + \tau\beta(\beta-2)$, from which the potential $\bar yV(x) = H(x/\bar x)$ is recovered as usual.

There is a percolation cluster if and only if the equation $H'(s) = sH'(1)$ has a solution $s^* < 1$. So we look for the solutions of $(1-s)(\tau\beta - \tau_1) = \tau\beta(1-s)^{\beta-1}$. If $\langle k^2 - 2k\rangle = \tau\beta - \tau_1$ is negative, there is no percolation cluster, but if it is positive, $1 - s^* = \left(1 - \tau_1/(\tau\beta)\right)^{1/(\beta-2)}$. The size of the giant component is $1 - H(s^*) \simeq (1 - s^*)H'(1)$, and

$$Q_\infty \sim \langle k^2 - 2k\rangle^{1/(\beta-2)}$$

close to the threshold. This is an example where the growth of the giant component close to the threshold is nonlinear as a function of $\langle k^2 - 2k\rangle$ (since $1/(\beta-2) > 1$).

The number of points in components of size $n$ is reconstructed from $T'(y) = V(x)$ with $x = yV'(x)$. Below the threshold, this leads to a singularity $T'_{\rm sing} \sim (\bar y - y)^\beta$, which implies that $C_n \sim n^{-\beta-1}$. Above the threshold, the radius of convergence $r$ of $T$ is larger than $\bar y$, leading to $C_n$'s that decrease exponentially, as $C_n \sim n^{-3/2}(\bar y/r)^n$.
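The closed-form expressions of examples 2 and 3 can be cross-checked numerically. The Python sketch below (function names and parameter values are illustrative assumptions) compares, for the geometric distribution, $Q_\infty = 1 - (x^*/\bar x)^{1/2}$ with a direct solution of $H'(s) = sH'(1)$ for $H(s) = (1-p)/(1-ps)$, and, for the scale-free example, checks that $s^* = 1 - (1 - \tau_1/(\tau\beta))^{1/(\beta-2)}$ solves $H'(s) = sH'(1)$.

```python
import math

def geometric_closed_form(p):
    # Q_inf = 1 - sqrt(x*/xbar), xbar x* = (xbar^2 + 2 - xbar sqrt(4 + xbar^2)) / 2
    xbar = math.sqrt(p / (1 - p))
    xstar = 0.5 * (xbar**2 + 2 - xbar * math.sqrt(4 + xbar**2)) / xbar
    if xstar >= xbar:  # no root below xbar: below the threshold p = 1/3
        return 0.0
    return 1.0 - math.sqrt(xstar / xbar)

def geometric_direct(p, tol=1e-15, max_iter=10**6):
    # Iterate s -> H'(s)/H'(1) = ((1 - p) / (1 - p s))^2 from s = 0;
    # this converges to the smallest root s* in [0, 1].
    s = 0.0
    for _ in range(max_iter):
        s_next = ((1 - p) / (1 - p * s)) ** 2
        done = abs(s_next - s) < tol
        s = s_next
        if done:
            break
    return 1.0 - (1 - p) / (1 - p * s)  # Q_inf = 1 - H(s*)

def scale_free_H(s, t0, t1, t2, t, beta):
    return (t0 + t1 * s + t2 * s**2 / 2
            + t * (1 - beta * s + beta * (beta - 1) * s**2 / 2 - (1 - s) ** beta))

def scale_free_dH(s, t1, t2, t, beta):
    return t1 + t2 * s + t * (-beta + beta * (beta - 1) * s + beta * (1 - s) ** (beta - 1))

def scale_free_giant(t0, t1, t2, t, beta):
    # Valid above the threshold, i.e. when <k^2 - 2k> = t * beta - t1 > 0.
    s_star = 1 - (1 - t1 / (t * beta)) ** (1 / (beta - 2))
    return s_star, 1 - scale_free_H(s_star, t0, t1, t2, t, beta)

for p in (0.3, 0.5, 0.8):
    print(p, geometric_closed_form(p), geometric_direct(p))

# beta = 2.5; t is fixed by the normalization H(1) = 1.
params = (0.1, 0.2, 0.4, 4 / 3, 2.5)
print(scale_free_giant(*params))
```

The geometric comparison also covers $p > 1/2$, the regime discussed in the ambiguity remark above.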



4 The case of oriented graphs 

It is not difficult to modify the previous arguments to deal with maximum 
entropy oriented graphs with given "in-out" degree distributions. We give 
the percolation criterion, omitting all details. 

The first result is that for such models, each vertex with $k$ outgoing and $l$ incoming edges contributes a fixed multiplicative factor, say $t_{k,l}$, to the weight of a graph. The generalization of the recursion formula for $Z_N$ reads

^N{U,j) = f / ,tk,l u; + ZiN_\{tij + LO + ti t j + i + LU_t i+ ij). 

J W + U - k,l 

The large $N$, finite connectivity limit is obtained by letting $N \to \infty$ while keeping $\tau_{k,l} = t_{k,l}N^{(k+l)/2}$ fixed. Defining

$$V(x_+,x_-) = \sum_{k,l}\tau_{k,l}\,\frac{x_+^k\,x_-^l}{k!\,l!},$$

a straightforward adaptation of the argument in Section 3.1 leads to the fact that the free energy $F$ is the value of

$$F(y,x_+,x_-) = -1 - \log y - x_+x_- + yV(x_+,x_-)$$

at the point $(y, x_+, x_-)$ where it is maximum:

$$x_+ = y\,(\partial_{x_-}V)(x_+,x_-), \qquad x_- = y\,(\partial_{x_+}V)(x_+,x_-), \qquad 1 = yV(x_+,x_-). \qquad (10)$$
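As a short worked check, assuming the function above reads $F(y,x_+,x_-) = -1 - \log y - x_+x_- + yV(x_+,x_-)$, the three conditions (10) are exactly its stationarity equations:

$$\frac{\partial F}{\partial y} = -\frac{1}{y} + V = 0, \qquad \frac{\partial F}{\partial x_+} = -x_- + y\,\partial_{x_+}V = 0, \qquad \frac{\partial F}{\partial x_-} = -x_+ + y\,\partial_{x_-}V = 0.$$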

The analysis of the first two equations is a bit more involved than the analysis of the single implicit equation for the non-oriented case. We can view the pair of equations $x_+ = y\,\partial_{x_-}V$ and $x_- = y\,\partial_{x_+}V$ in the following way. It defines a function $y$ over the curve $C$ in the positive quadrant of the $(x_+, x_-)$ plane given by $x_-\partial_{x_-}V = x_+\partial_{x_+}V$ (obtained by eliminating $y$ between the two equations). This curve is smooth as long as $V$ is well defined. For instance, one can take $x = x_+\partial_{x_+}V = x_-\partial_{x_-}V$ as an analytic parameter on it. Then $y$ is a smooth convex function of $x$, and all properties of the non-oriented case are true for $y(x)$: $y$ is a good analytic parameter on $C$ for small $y$, but there is a singularity if the convex function $y(x)$ has a maximum. To be more explicit, taking differentials we see that

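$$\begin{pmatrix} dx/x \\ dx/x \end{pmatrix} = \begin{pmatrix} 1 + x_+\partial^2_{x_+}V/\partial_{x_+}V & x_-\partial_{x_+}\partial_{x_-}V/\partial_{x_+}V \\ x_+\partial_{x_-}\partial_{x_+}V/\partial_{x_-}V & 1 + x_-\partial^2_{x_-}V/\partial_{x_-}V \end{pmatrix}\begin{pmatrix} dx_+/x_+ \\ dx_-/x_- \end{pmatrix}$$

and

$$\begin{pmatrix} dy/y \\ dy/y \end{pmatrix} = \begin{pmatrix} 1 - x_+\partial_{x_-}\partial_{x_+}V/\partial_{x_-}V & -x_-\partial^2_{x_-}V/\partial_{x_-}V \\ -x_+\partial^2_{x_+}V/\partial_{x_+}V & 1 - x_-\partial_{x_+}\partial_{x_-}V/\partial_{x_+}V \end{pmatrix}\begin{pmatrix} dx_+/x_+ \\ dx_-/x_- \end{pmatrix}.$$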

A simple computation shows that the determinant of the 2 by 2 matrix in the first relation is always strictly positive, whereas the determinant of the 2 by 2 matrix in the second relation is positive for small $y$ but can change sign. This happens if $y'(x)$ vanishes. So the discussion of the non-oriented case carries over word for word. It is consistent to write $\bar y = y(\bar x)$, $\bar x_\pm = x_\pm(\bar x)$, and $x^*(\bar x)$ for the smallest $x$ such that $y(x) = y(\bar x)$. We set $x^*_\pm = x_\pm(x^*)$. There is a percolation cluster if and only if the second determinant is $< 0$ at $(\bar y, \bar x_+, \bar x_-)$. This is equivalent to

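$$(\partial_{x_-}V - x_+\partial_{x_-}\partial_{x_+}V)(\partial_{x_+}V - x_-\partial_{x_+}\partial_{x_-}V) - x_+x_-\,\partial^2_{x_-}V\,\partial^2_{x_+}V$$

being $< 0$ at that point (this expression is the second determinant multiplied by the positive factor $\partial_{x_+}V\,\partial_{x_-}V$). Then the fraction of sites in the percolation cluster is

$$Q_\infty = 1 - \bar yV(x^*_+, x^*_-).$$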

To get the percolation criterion, we just have to rephrase the vanishing of the determinant in terms of the probability distribution $\pi_{k_+,k_-}$ that a vertex of the random graph has $k_+$ outgoing and $k_-$ incoming edges. The explicit formula is

$$\pi_{k_+,k_-} = \bar y\,\tau_{k_+,k_-}\,\frac{\bar x_+^{k_+}\,\bar x_-^{k_-}}{k_+!\,k_-!}. \qquad (11)$$

By construction, $\sum_{k,l} k\,\pi_{k,l} = \sum_{k,l} l\,\pi_{k,l} = \langle k\rangle$, since any graph has the same number of outgoing and incoming edges. The parameters $\bar x_+$ and $\bar x_-$ are constrained by the relation

$$\bar x_+\bar x_- = \langle k\rangle.$$
The percolation criterion reads:

$$\left(\langle k\rangle - \langle k_+k_-\rangle\right)^2 - \langle k_+^2 - k_+\rangle\,\langle k_-^2 - k_-\rangle < 0. \qquad (12)$$
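The step from the determinant condition to (12) can be spelled out in a few lines. By eq. (11), the moments of $\pi_{k_+,k_-}$ are derivatives of $\bar yV$ at $(\bar x_+,\bar x_-)$:

$$\langle k_+\rangle = \bar y\,\bar x_+\partial_{x_+}V = \bar x_+\bar x_-, \quad \langle k_-\rangle = \bar y\,\bar x_-\partial_{x_-}V = \bar x_+\bar x_-, \quad \langle k_+k_-\rangle = \bar y\,\bar x_+\bar x_-\,\partial_{x_+}\partial_{x_-}V, \quad \langle k_\pm^2 - k_\pm\rangle = \bar y\,\bar x_\pm^2\,\partial^2_{x_\pm}V,$$

where the second equalities in the first two relations use (10). Multiplying the determinant expression by $\bar y^2\bar x_+\bar x_- > 0$ then gives

$$\bar y^2\bar x_+\bar x_-\Big[(\partial_{x_-}V - \bar x_+\partial_{x_-}\partial_{x_+}V)(\partial_{x_+}V - \bar x_-\partial_{x_+}\partial_{x_-}V) - \bar x_+\bar x_-\,\partial^2_{x_-}V\,\partial^2_{x_+}V\Big] = \left(\langle k\rangle - \langle k_+k_-\rangle\right)^2 - \langle k_+^2 - k_+\rangle\,\langle k_-^2 - k_-\rangle,$$

so the two sign conditions coincide.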

Given the distribution $\pi_{k,l}$, the potential $V$ can be reconstructed via eq. (11). As in the non-oriented case, the parameter $\bar y$ is an arbitrary normalization factor. The product of the parameters $\bar x_\pm$ is determined by $\bar x_+\bar x_- = \langle k\rangle$. The ratio $\bar x_+/\bar x_-$ can be chosen at will, since there is a natural invariance in eq. (10). Namely, if $x_\pm$ solves the extremum condition for the potential $V(x_+,x_-)$, then $x'_\pm = \lambda^{\mp1}x_\pm$ solves it for the potential $W(x_+,x_-) = V(\lambda x_+, \lambda^{-1}x_-)$, and this change leaves $\pi_{k,l}$ invariant. Again, this finds its origin in the fact that any graph has the same number of outgoing and incoming edges.
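A minimal Python sketch of the criterion (12) and of the rescaling invariance, assuming the joint distribution is stored as a dictionary over $(k_+, k_-)$ pairs; all names are hypothetical. For independent Poisson in- and out-degrees of mean $c$, the left-hand side of (12) reduces to $c^2(1 - 2c)$, so the criterion reads $c > 1/2$, consistent with a mean total degree $2c$ exceeding one.

```python
import math

def criterion(pi):
    # pi: dict mapping (k_plus, k_minus) -> probability.
    # Returns the left-hand side of (12); percolation iff it is negative.
    m_plus = sum(kp * w for (kp, km), w in pi.items())
    m_minus = sum(km * w for (kp, km), w in pi.items())
    if abs(m_plus - m_minus) > 1e-9:
        raise ValueError("<k_+> and <k_-> must coincide for oriented graphs")
    cross = sum(kp * km * w for (kp, km), w in pi.items())
    f_plus = sum(kp * (kp - 1) * w for (kp, km), w in pi.items())
    f_minus = sum(km * (km - 1) * w for (kp, km), w in pi.items())
    return (m_plus - cross) ** 2 - f_plus * f_minus

def independent_poisson(c, kmax=40):
    # Joint distribution with independent Poisson(c) out- and in-degrees.
    w = [math.exp(-c) * c**k / math.factorial(k) for k in range(kmax + 1)]
    return {(i, j): w[i] * w[j] for i in range(kmax + 1) for j in range(kmax + 1)}

def pi_from_weights(ytau, xp, xm):
    # Eq. (11) with ybar absorbed into ytau: ytau[k, l] x_+^k x_-^l / (k! l!)
    return {(k, l): t * xp**k * xm**l / (math.factorial(k) * math.factorial(l))
            for (k, l), t in ytau.items()}

def rescale(ytau, lam):
    # W(x_+, x_-) = V(lam x_+, x_- / lam) has coefficients lam^(k - l) tau_{k,l}
    return {(k, l): t * lam ** (k - l) for (k, l), t in ytau.items()}

for c in (0.4, 0.6):
    print(c, criterion(independent_poisson(c)) < 0)
```

To see the invariance, evaluate `pi_from_weights(rescale(ytau, lam), xp / lam, xm * lam)` and compare it with `pi_from_weights(ytau, xp, xm)`: the factors $\lambda^{k-l}$ and $\lambda^{-k}\lambda^{l}$ cancel term by term.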